
feat!: Add per-execution runId, at-most-once tracking, and cross-process tracker resumption #133

Open
jsonbailey wants to merge 22 commits into main from jb/aic-2207/update-ai-sdks-billing-spec

Conversation

@jsonbailey
Contributor

@jsonbailey jsonbailey commented Apr 15, 2026

Summary

  • Per-execution runId: Every tracker now includes a unique runId (UUID) in all track event payloads, enabling billing isolation per execution
  • At-most-once semantics: Each metric type (duration, tokens, success/error, feedback, time-to-first-token) can only be tracked once per tracker instance; subsequent calls are dropped and a warning is logged
  • create_tracker() factory on config objects: AICompletionConfig, AIAgentConfig, and AIJudgeConfig now carry an optional create_tracker callable that returns a fresh LDAIConfigTracker with a new runId each time it's called. Set to None when the config is disabled.
  • Per-invocation trackers in managed classes: ManagedModel.invoke(), ManagedAgent.run(), and Judge.evaluate() now call create_tracker() at the start of each invocation to get a fresh tracker, fixing the multi-turn tracking issue where at-most-once guards blocked metrics from second+ invocations
  • resumption_token property on tracker: URL-safe Base64-encoded (no padding) JSON string containing {runId, configKey, variationKey, version} for cross-process tracker reconstruction
  • LDAIClient.create_tracker(token, context): Reconstructs a tracker from a resumption token for deferred feedback scenarios. Validates required fields and raises ValueError for invalid tokens.
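
The resumption token described in the summary can be pictured with a small stand-alone sketch. The helper names `encode_resumption_token` and `decode_resumption_token` are hypothetical and are not the SDK's API; only the token shape (URL-safe Base64 without padding, wrapping JSON with runId, configKey, variationKey, and version) is taken from the description above.

```python
import base64
import json

REQUIRED_FIELDS = ("runId", "configKey", "variationKey", "version")


def encode_resumption_token(run_id: str, config_key: str, variation_key: str, version: int) -> str:
    """Hypothetical helper: serialize tracker identity as unpadded URL-safe Base64 JSON."""
    payload = {
        "runId": run_id,
        "configKey": config_key,
        "variationKey": variation_key,
        "version": version,
    }
    raw = json.dumps(payload, separators=(",", ":")).encode("utf-8")
    # Strip the trailing '=' padding so the token can be embedded in URLs as-is.
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


def decode_resumption_token(token: str) -> dict:
    """Hypothetical helper: restore padding, decode, and validate required fields."""
    # Base64 input length must be a multiple of 4, so add back the stripped padding.
    padded = token + "=" * (-len(token) % 4)
    try:
        data = json.loads(base64.urlsafe_b64decode(padded).decode("utf-8"))
    except ValueError as exc:  # binascii.Error, UnicodeDecodeError, JSONDecodeError
        raise ValueError(f"Invalid resumption token: {exc}") from exc
    if not isinstance(data, dict):
        raise ValueError("Invalid resumption token: payload is not a JSON object")
    missing = [field for field in REQUIRED_FIELDS if field not in data]
    if missing:
        raise ValueError(f"Invalid resumption token: missing {missing}")
    return data
```

A token produced in one process can be handed to another (for example alongside a response ID) and decoded later to attach deferred feedback to the original runId.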

Test plan

  • Enabled config has create_tracker callable; disabled config has None
  • Each create_tracker() call returns a new tracker with a distinct runId
  • Factory closure captures correct flag metadata (configKey, variationKey, version, modelName, providerName)
  • ManagedAgent.run() uses create_tracker() when available, falls back to stored tracker
  • Resumption token round-trip encode/decode preserves all fields
  • Resumption token has no base64 padding characters
  • create_tracker(token, context) reconstructs tracker with original runId and empty model/provider
  • Invalid base64, invalid JSON, and missing required fields all raise ValueError
  • All 137 existing + new tests pass with no regressions

🤖 Generated with Claude Code


Note

Medium Risk
Medium risk due to broad, breaking API changes to tracking/tracker plumbing across configs, managed wrappers, and graph runners; mistakes could cause missing or duplicated telemetry events.

Overview
Reworks tracking to be per-execution rather than per-config by replacing stored tracker instances with create_tracker() factories on AIConfig/AIAgentConfig/AICompletionConfig/AIJudgeConfig and AgentGraphDefinition, and updating LangChain/OpenAI graph runners and callback flushing to call these factories.

Adds a runId to every LDAIConfigTracker event and enforces at-most-once metric emission per tracker (duration, tokens, success/error, feedback, time-to-first-token), with warnings on duplicate attempts.

Introduces cross-process tracker resumption via LDAIConfigTracker.resumption_token + from_resumption_token, exposed as LDAIClient.create_tracker(token, context), and updates managed wrappers (ManagedModel, ManagedAgent, Judge, ManagedAgentGraph) to create fresh trackers per invocation/run; extensive tests are updated/added accordingly.

Reviewed by Cursor Bugbot for commit a41c7e3. Bugbot is set up for automated code reviews on this repo. Configure here.

@jsonbailey jsonbailey changed the title from "feat!: Add per-execution runId and at-most-once event tracking" to "feat!: Add per-execution runId, at-most-once tracking, and cross-process tracker resumption" Apr 15, 2026
jsonbailey and others added 2 commits April 16, 2026 11:03
- Each tracker now carries a runId (UUIDv4) included in all emitted
  events, scoping every metric to a single execution
- At-most-once semantics: duplicate calls to track_duration,
  track_tokens, track_success/track_error, track_feedback, and
  track_time_to_first_token on the same tracker are dropped with a
  warning

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ess tracker resumption

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@jsonbailey jsonbailey force-pushed the jb/aic-2207/update-ai-sdks-billing-spec branch from bdf7384 to 211ead4 April 16, 2026 16:48
jsonbailey and others added 13 commits April 16, 2026 12:46
…osure

The run_id parameter on LDAIConfigTracker is now required (no default).
UUID generation happens in the tracker_factory closure in client.py,
keeping the tracker itself a plain data holder.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
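
The commit above moves UUID generation out of the tracker and into a factory closure. A minimal sketch of that split follows, assuming a simplified tracker; `SketchTracker` and `make_tracker_factory` are illustrative names, not the code in client.py.

```python
import uuid
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class SketchTracker:
    """Stand-in for a tracker that only holds data; run_id has no default."""
    run_id: str
    config_key: str
    variation_key: str
    version: int


def make_tracker_factory(config_key: str, variation_key: str, version: int) -> Callable[[], SketchTracker]:
    """Return a closure that captures flag metadata and mints a new run_id per call."""

    def tracker_factory() -> SketchTracker:
        return SketchTracker(
            run_id=str(uuid.uuid4()),  # generated here, not inside the tracker
            config_key=config_key,
            variation_key=variation_key,
            version=version,
        )

    return tracker_factory
```

Because the tracker never generates its own ID, a tracker rebuilt from a resumption token can be given the original runId through the same constructor.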
- Break long tuple lines in client.py to stay under 120 char limit
- Add required run_id parameter to LDAIConfigTracker calls in
  openai and langchain provider tests

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Remove the redundant _tracked dict from LDAIConfigTracker. The summary
already stores each metric with None as the unset sentinel, so the
nil-check on summary properties serves as the at-most-once guard.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
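
The idea of using the summary's None sentinel as the at-most-once guard can be sketched as below. This is an assumption-laden illustration: `SketchSummary`, `SketchTracker`, and the `events` list are hypothetical, and only two of the metric types are shown.

```python
import logging
from dataclasses import dataclass
from typing import List, Optional, Tuple

log = logging.getLogger("at_most_once_sketch")


@dataclass
class SketchSummary:
    """None means 'not tracked yet'; any other value means the metric was emitted."""
    duration_ms: Optional[int] = None
    success: Optional[bool] = None


class SketchTracker:
    def __init__(self, run_id: str) -> None:
        self.run_id = run_id
        self.summary = SketchSummary()
        self.events: List[Tuple[str, int]] = []  # stands in for events sent to the backend

    def track_duration(self, duration_ms: int) -> None:
        # Compare against None explicitly so a legitimate 0 ms duration counts as tracked.
        if self.summary.duration_ms is not None:
            log.warning("Duration already tracked for run %s; dropping duplicate", self.run_id)
            return
        self.summary.duration_ms = duration_ms
        self.events.append(("duration", duration_ms))

    def track_success(self) -> None:
        self._track_outcome(True)

    def track_error(self) -> None:
        self._track_outcome(False)

    def _track_outcome(self, succeeded: bool) -> None:
        # Success and error share one slot, so only the first outcome is recorded.
        if self.summary.success is not None:
            log.warning("Outcome already tracked for run %s; dropping duplicate", self.run_id)
            return
        self.summary.success = succeeded
        self.events.append(("success" if succeeded else "error", 1))
```

The explicit `is not None` check matters: a truthiness check would treat a recorded value of 0 or False as unset and let a duplicate through.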
New order: ld_client, run_id, config_key, variation_key, version,
model_name, provider_name, context, graph_key. All call sites
converted to keyword arguments for resilience against future reorders.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…oken

Reorder LDAIConfigTracker.__init__ to match updated spec: context now
comes before model_name and provider_name.

Also fix resumption_token to omit variationKey from the JSON when it is
empty, and handle the absent key when reconstructing from a token.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
All six at-most-once guard warnings in tracker.py now log the track data
dict (runId, configKey, etc.) to aid debugging duplicate-track scenarios.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Move the resumption token decoding logic from LDAIClient.create_tracker
into a classmethod on LDAIConfigTracker per spec 1.1.20.2. The client
method now delegates to LDAIConfigTracker.from_resumption_token.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Match the resumption token behavior: only include variationKey in the
track data dict when it has a non-empty value.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The create_tracker field on AIConfig is now always a callable that
returns a working tracker, even when the config is disabled. The
factory is always set to tracker_factory — callers use the enabled
flag to decide whether to proceed, not the factory result.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
BREAKING CHANGE: The `tracker` field has been removed from all config
dataclasses (AICompletionConfig, AIJudgeConfig, AIAgentConfig). Users
must now call `config.create_tracker()` to obtain a tracker instance.

ManagedModel and ManagedAgent no longer accept a tracker constructor
parameter — they call `create_tracker()` from the config on each
invocation. The `__evaluate` return tuple no longer includes a
pre-created tracker.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add graphKey to the resumption token following the spec key order:
runId, configKey, variationKey (if set), version, graphKey (if set).
The from_resumption_token classmethod now decodes and passes graphKey
to the tracker constructor.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
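
The key order and optional-key handling described in the commit above can be illustrated with plain JSON, leaving out the Base64 step. The function names are hypothetical; the sketch relies on Python dicts preserving insertion order, which `json.dumps` then follows.

```python
import json
from typing import Optional


def build_token_payload(
    run_id: str,
    config_key: str,
    version: int,
    variation_key: str = "",
    graph_key: Optional[str] = None,
) -> str:
    """Hypothetical helper: emit keys in the order runId, configKey, variationKey, version, graphKey."""
    payload = {"runId": run_id, "configKey": config_key}
    if variation_key:  # omitted entirely when empty
        payload["variationKey"] = variation_key
    payload["version"] = version
    if graph_key:  # omitted entirely when unset
        payload["graphKey"] = graph_key
    return json.dumps(payload, separators=(",", ":"))


def read_token_payload(raw: str) -> dict:
    """Hypothetical helper: tolerate the optional keys being absent."""
    data = json.loads(raw)
    return {
        "run_id": data["runId"],
        "config_key": data["configKey"],
        "variation_key": data.get("variationKey", ""),
        "version": data["version"],
        "graph_key": data.get("graphKey"),
    }
```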
Judge now calls self._ai_config.create_tracker() per evaluate()
invocation instead of receiving a tracker at construction time.
ManagedAgentGraph no longer stores or exposes a tracker.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replace logging.getLogger(__name__) with the SDK's shared log instance
(from ldai import log) for consistency with the rest of the codebase.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Comment thread packages/sdk/server-ai/src/ldai/tracker.py Outdated
Migrate langchain and openai provider packages from config.tracker
to config.create_tracker() and fix test signatures to match.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@jsonbailey jsonbailey marked this pull request as ready for review April 17, 2026 16:56
@jsonbailey jsonbailey requested a review from a team as a code owner April 17, 2026 16:56
jsonbailey and others added 3 commits April 17, 2026 14:56
… factory

Per AIGRAPH spec 1.4.3, AgentGraphDefinition now has a create_tracker
callable that returns a new AIGraphTracker per invocation instead of
storing a pre-created instance. Removes get_tracker() method entirely.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
_flush_final_segment and _track_tool_calls were each calling
create_tracker() independently, generating new runIds that broke
per-execution event correlation. Now build_node creates one tracker
per node, cached in _node_trackers, and reused by all tracking methods.

Adds test_same_run_id_across_token_success_and_tool_call_events to
verify all node-level events for a single execution share one runId.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
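
A rough sketch of the per-node caching described in the commit above: the factory is called once when the node is built, and later tracking methods reuse the cached instance so every event carries the same runId. The class and method bodies here are simplified stand-ins, not the runner's actual code.

```python
import uuid
from typing import Callable, Dict, List, Tuple


class SketchNodeTracker:
    """Stand-in tracker: records (event name, run_id) pairs instead of sending events."""

    def __init__(self, run_id: str) -> None:
        self.run_id = run_id
        self.events: List[Tuple[str, str]] = []

    def track(self, name: str) -> None:
        self.events.append((name, self.run_id))


class SketchRunner:
    """Hypothetical runner that creates one tracker per node and reuses it."""

    def __init__(self, factories: Dict[str, Callable[[], SketchNodeTracker]]) -> None:
        self._factories = factories
        self._node_trackers: Dict[str, SketchNodeTracker] = {}

    def build_node(self, node_key: str) -> None:
        # The only place the factory is called, so each node gets exactly one run_id.
        self._node_trackers[node_key] = self._factories[node_key]()

    def flush_final_segment(self, node_key: str) -> None:
        tracker = self._node_trackers[node_key]  # reuse; do not call the factory again
        tracker.track("tokens")
        tracker.track("success")

    def track_tool_calls(self, node_key: str) -> None:
        self._node_trackers[node_key].track("tool_call")


def new_tracker() -> SketchNodeTracker:
    return SketchNodeTracker(run_id=str(uuid.uuid4()))
```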
run() and _build_agents() each called create_tracker() on the graph,
producing two tracker instances. Now run() creates the tracker once
and passes it to _build_agents() so handoff callbacks and run-level
tracking share the same instance.

Tests now assert graph.create_tracker is called exactly once per run
and node create_tracker is called exactly once per node.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
    "providerName": self._provider_name,
}
if self._variation_key:
    data["variationKey"] = self._variation_key

variationKey silently dropped from track event payloads

Medium Severity

The __get_track_data() method now conditionally includes variationKey only when self._variation_key is truthy. Previously, variationKey was always present in every track event payload (even as an empty string). When variation_key defaults to '' (e.g. from variation.get('_ldMeta', {}).get('variationKey', '') falling back), the field is entirely omitted from events sent to LaunchDarkly. Downstream consumers and backend analytics that expect variationKey to always be present in the event schema may break or misinterpret events.


Reviewed by Cursor Bugbot for commit 5313ce5. Configure here.

messages = self._construct_evaluation_messages(input_text, output_text)
assert self._evaluation_response_structure is not None

response = await self._ai_config_tracker.track_metrics_of_async(

Judge evaluate crashes when create_tracker returns None

Low Severity

Judge.evaluate() calls self._ai_config.create_tracker() on line 73 and immediately uses the result on line 77 without a null check. The base AIConfig.create_tracker defaults to lambda: None. Previously, _initialize_judge in client.py guarded against a missing tracker with not judge_config.tracker, but that guard was removed (line 269). If a Judge is constructed with a config using the default factory, tracker is None and the track_metrics_of_async call raises an AttributeError — caught by the outer except, but producing an unhelpful error message.


Reviewed by Cursor Bugbot for commit 04f14eb. Configure here.

jsonbailey and others added 2 commits April 17, 2026 17:27
from_resumption_token and LDAIClient.create_tracker now return
ldclient.Result instead of raising ValueError on invalid tokens,
letting callers handle errors without try/except.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
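
To illustrate the shift from raising to returning a result, here is a sketch that uses a stand-in `SketchResult` so it runs without the SDK installed. The stand-in's shape is an assumption and may not match `ldclient.Result`; `fields_from_payload` is hypothetical and skips the Base64 step for brevity.

```python
import json
from dataclasses import dataclass
from typing import Any, Optional


@dataclass(frozen=True)
class SketchResult:
    """Stand-in result type: holds either a value or an error message."""
    value: Any = None
    error: Optional[str] = None

    def is_success(self) -> bool:
        return self.error is None


def fields_from_payload(raw_json: str) -> SketchResult:
    """Hypothetical decoder that reports failures as a result instead of raising."""
    try:
        data = json.loads(raw_json)
    except json.JSONDecodeError as exc:
        return SketchResult(error=f"Invalid resumption token: {exc}")
    if not isinstance(data, dict) or "runId" not in data:
        return SketchResult(error="Invalid resumption token: missing runId")
    return SketchResult(value=data)


# The caller branches on the result; no try/except is needed.
result = fields_from_payload('{"runId": "run-1"}')
run_id = result.value["runId"] if result.is_success() else None
```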
Change AgentGraphDefinition.create_tracker from
Callable[[], AIGraphTracker] with default lambda: None to
Optional[Callable[[], AIGraphTracker]] with default None. Guard
call sites in both runners with `is not None` before invoking.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
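
The difference between a `lambda: None` default and an Optional factory can be sketched as follows, with hypothetical stand-in classes. With the Optional form, the annotation matches what callers actually receive, and absence can be checked without invoking anything.

```python
from dataclasses import dataclass
from typing import Callable, Optional


class SketchGraphTracker:
    """Stand-in for a graph-level tracker."""

    def __init__(self, graph_key: str) -> None:
        self.graph_key = graph_key


@dataclass
class SketchGraphDefinition:
    key: str
    # None means "no tracking available", rather than a factory that returns None.
    create_tracker: Optional[Callable[[], SketchGraphTracker]] = None


def start_run(graph: SketchGraphDefinition) -> Optional[SketchGraphTracker]:
    """Guard the call site with an explicit `is not None` check before invoking."""
    if graph.create_tracker is not None:
        return graph.create_tracker()
    return None
```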
The disabled() factory on AIConfigDefault and subclasses created
configs without tracker factories, breaking the spec requirement.
Replace with private module-level constants in client.py, matching
how js-core handles disabled configs as an internal concern.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

@cursor cursor bot left a comment


Cursor Bugbot has reviewed your changes and found 3 potential issues.

There are 5 total unresolved issues (including 2 from previous reviews).


Reviewed by Cursor Bugbot for commit a41c7e3. Configure here.

 if not node:
     continue
-config_tracker = node.get_config().tracker
+config_tracker = node.get_config().create_tracker()

Callback handler creates new tracker per flush, losing runId correlation

Medium Severity

The flush() method calls node.get_config().create_tracker() for each node in the path. In production, this factory (from client.py) generates a new UUID runId each time it's called. This means if the same node's tracker was already obtained elsewhere during the run (e.g., for logging in _build_graph), the flush() creates an entirely separate tracker with a different runId. More importantly, if flush() were ever called more than once, each call would create new trackers with new runIds and emit duplicate events (the at-most-once guards are per-tracker-instance, and each call gets a fresh instance). The OpenAI runner avoids this by caching trackers in _node_trackers, but the LangGraph callback handler has no such caching.


Reviewed by Cursor Bugbot for commit a41c7e3. Configure here.

self._graph.traverse(fn=handle_traversal)

-tracker = self._graph.get_tracker()
+tracker = self._graph.create_tracker() if self._graph.create_tracker is not None else None

Wasted graph tracker created solely for debug logging

Low Severity

_build_graph calls self._graph.create_tracker() and instantiates a full AIGraphTracker just to read its graph_key property for a debug log message. The graph key is readily available from self._graph._agent_graph.key without creating a tracker object. This adds an unnecessary side effect in a method that otherwise only builds the graph structure.


Reviewed by Cursor Bugbot for commit a41c7e3. Configure here.

        base64.urlsafe_b64decode(padded.encode("utf-8")).decode("utf-8")
    )
except (json.JSONDecodeError, Exception) as e:
    return Result.fail(f"Invalid resumption token: {e}", e)

Redundant exception type in except clause

Low Severity

The except (json.JSONDecodeError, Exception) clause is equivalent to just except Exception since json.JSONDecodeError is a subclass of Exception. Listing both suggests the intent was to catch only JSON and base64-specific errors, but the broad Exception catch masks that intent and silently swallows all exceptions, including unexpected ones.
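
One possible way to narrow the clause, shown as a sketch rather than a proposed patch: `binascii.Error`, `UnicodeDecodeError`, and `json.JSONDecodeError` are all subclasses of `ValueError`, so catching `ValueError` covers malformed tokens while letting unrelated errors propagate. The function name `parse_token` is hypothetical.

```python
import base64
import binascii
import json
from typing import Any, Optional


def parse_token(token: str) -> Optional[Any]:
    """Hypothetical decoder: returns None for malformed tokens, re-raises everything else."""
    try:
        padded = token + "=" * (-len(token) % 4)
        return json.loads(base64.urlsafe_b64decode(padded).decode("utf-8"))
    except ValueError:
        # Covers bad Base64 (binascii.Error), bad UTF-8, and bad JSON only.
        return None
```

With this shape, passing a non-string such as None raises TypeError instead of being reported as an invalid token, which keeps programming errors visible.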


Reviewed by Cursor Bugbot for commit a41c7e3. Configure here.

3 participants